GPU vs FPGA: A Comparative Analysis for Non-standard Precision
Authors
Abstract
FPGAs and GPUs are increasingly used in a range of high-performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating-point representations allows additional flexibility to achieve this. This work studies the language and hardware support for non-standard precisions on a GPU and an FPGA, and the peak performance achievable with them. A compute-intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for a range of matrix sizes. The results show that for sufficiently large matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than a GPU implementation.
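As a rough illustration of what "double-double" means in the abstract above, the following sketch represents a value as an unevaluated sum of two ordinary doubles (hi, lo) and adds such values using Knuth's TwoSum error-free transformation. This is a plain software sketch written for this summary, not the operators evaluated in the paper; the function names `two_sum`, `dd_add`, `dd_sum`, and `naive_sum` are hypothetical.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, err) where s is the rounded sum a + b
    and err is the rounding error, so that a + b == s + err (barring overflow)."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double values, each given as a (hi, lo) pair of floats.
    This is the simple ("sloppy") variant: low parts are folded into the error term."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalize so the result is again a (hi, lo) pair.
    return two_sum(s, e)


def dd_sum(values):
    """Accumulate plain floats into a double-double accumulator."""
    acc = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, (v, 0.0))
    return acc


def naive_sum(values):
    """Accumulate plain floats in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total
```

For the input `[1e16, 1.0, -1e16]`, `naive_sum` returns 0.0 because the 1.0 is lost when added to 1e16, whereas the two parts returned by `dd_sum` add up to 1.0. The extra operations per addition hint at the accuracy-versus-resources trade-off the abstract describes.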
Similar resources
Accelerated BLAST Performance with Tera-BLAST™: a comparison of FPGA versus GPU and CPU BLAST implementations
A number of technologies have emerged for accelerating similarity search algorithms in bioinformatics, including the use of field programmable gate arrays (FPGA), graphics processing units (GPU), and clusters of standard multicore CPUs. Here we present Tera-BLAST™, an FPGA-accelerated implementation of the BLAST algorithm, and compare the performance to GPU-accelerated BLAST and the industry s...
FPGA vs GPU Performance Comparison on the Implementation of FIR Filters
FIR filters are used in digital signal processing applications that require stopping one frequency band while passing another, or removing noise. Due to the complex structure and inherent parallelism of FIR filters, dedicated reconfigurable hardware is preferred for implementation rather than CPUs. Recently, GPGPU emerged as an effective technique for solving computation-intensive problems...
FPGA-based of Thermogram Enhancement Algorithm for Non-destructive Thermal Characterization
A Comparative Study on Instrumental Precision of Refrigerated and Non-Refrigerated Auto-Analyzers in Order to Improve Quality Assurance in Biochemistry Laboratory
Background and Objective: Quality control is one of the most important components in order to improve quality assurance in laboratories during analytical steps. For this purpose, coefficient of variation plays an important role. Due to the fast improvement in technology, application of inferential statistics for the comparisons ...
A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem
Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This paper presents a streaming architecture comprising floating-point operators coupled with high-bandwidth on-chip memories for the Lanczos method, an iterative algorithm for symmetric eigenvalue computation. We show the Lanc...
Journal title:
Volume, issue
Pages -
Publication date: 2014